Estimating Semantic Distance Using Soft Semantic Constraints in Knowledge-Source - Corpus Hybrid Models

نویسندگان

  • Yuval Marton
  • Saif Mohammad
  • Philip Resnik
چکیده

Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus–thesaurus hybrid method that uses soft constraints to generate word-senseaware distributional profiles (DPs) from coarser “concept DPs” (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabularylimited: if the target word is not in the thesaurus, the method falls back gracefully on the word’s co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domainspecific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Maximum Entropy Approach for Semantic Language Modeling

The conventional n-gram language model exploits only the immediate context of historical words without exploring long-distance semantic information. In this paper, we present a new information source extracted from latent semantic analysis (LSA) and adopt the maximum entropy (ME) principle to integrate it into an n-gram language model. With the ME approach, each information source serves as a s...

متن کامل

Enhancing Document Clustering Using Hybrid Models for Semantic Similarity

Different document representation models have been proposed to measure semantic similarity between documents using corpus statistics. Some of these models explicitly estimate semantic similarity based on measures of correlations between terms, while others apply dimension reduction techniques to obtain latent representation of concepts. This paper proposes new hybrid models that combine explici...

متن کامل

Cross-Lingual Distributional Profiles of Concepts for Measuring Semantic Distance

We present the idea of estimating semantic distance in one, possibly resource-poor, language using a knowledge source in another, possibly resource-rich, language. We do so by creating cross-lingual distributional profiles of concepts, using a bilingual lexicon and a bootstrapping algorithm, but without the use of any sense-annotated data or word-aligned corpora. The cross-lingual measures of s...

متن کامل

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Title of dissertation: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models Yuval Marton, Doctor of Philosophy, 2009 Dissertation directed by: Professor Philip Resnik, Department of Linguistics and Institute for Advanced Computer Studies This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with lingu...

متن کامل

Semantic Prosody: Its Knowledge and Appropriate Selection of Equivalents

In translation, choosing appropriate equivalent is essential to convey the right message from source-text to target-text, and one of the issues that may have a determinative role in appropriate equivalent choice is the semantic prosody (SP) behavior of words and the relation existing between the SP of a word and semantic senses (i.e. negativity, positivity or neutrality) of its collocations in ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009